\title{Data}

{{navbar}}

\subsubsection{Data}

Data defines a set of observations. There are three ways
to read data in Edward. They follow the
\href{https://www.tensorflow.org/programmers_guide/reading_data}
{three ways to read data in TensorFlow}.

\textbf{Preloaded data.}
A constant or variable in the TensorFlow graph holds all the data.
This setting is the fastest to work with and is recommended if the
data fits in memory.

Represent the data as NumPy arrays or TensorFlow tensors.

\begin{lstlisting}[language=Python]
x_data = np.array([0, 1, 0, 0, 0, 0, 0, 0, 0, 1])
x_data = tf.constant([0, 1, 0, 0, 0, 0, 0, 0, 0, 1])
\end{lstlisting}

During inference, we store them in TensorFlow variables internally to
prevent copying data more than once in memory. As an example, see the
\href{http://nbviewer.jupyter.org/github/blei-lab/edward/blob/master/notebooks/getting_started.ipynb}{getting started} notebook.

\textbf{Feeding.}
Manual code provides the data when running each step of inference.
This setting provides the most fine control which is useful for
experimentation.

Represent the data as
\href{https://www.tensorflow.org/programmers_guide/reading_data#feeding}{TensorFlow placeholders},
which are nodes in the graph that are fed at runtime.

\begin{lstlisting}[language=Python]
x_data = tf.placeholder(tf.float32, [100, 25])  # placeholder of shape (100, 25)
\end{lstlisting}

During inference, the user must manually feed the placeholders. At each
step, call \texttt{inference.update()} while
passing in a \texttt{feed\_dict} dictionary
which binds placeholders to realized values as an argument. As an example, see the
\href{https://github.com/blei-lab/edward/blob/master/examples/vae.py}
{variational auto-encoder} script.
If the values do not change over inference updates, one can also bind
the placeholder to values within the \texttt{data} argument when
first constructing inference.

\textbf{Reading from files.}
An input pipeline reads the data from files at the beginning of a
TensorFlow graph. This setting is recommended if the data does not
fit in memory.

\begin{lstlisting}[language=Python]
filename_queue = tf.train.string_input_producer(...)
reader = tf.SomeReader()
...
\end{lstlisting}

Represent the data as TensorFlow tensors, where the tensors are the
output of data readers. During inference, each update will be
automatically evaluated over new batch tensors represented through
the data readers. As an example, see the
\href{https://github.com/blei-lab/edward/blob/master/tests/inferences/test_inference_data.py}{data unit test}.
